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ABSTRACT 

A method to compute the fuU hierarchy of the critical subsets of a density field 
is presented. It is based on a watershed technique and uses a probability propagation 
scheme to improve the quality of the segmentation by circumventing the discreteness 
of the sampling. It can be applied within spaces of arbitrary dimensions and geometry. 
This recursive segmentation of space yields, for a d-dimensional space, a d—1 succession 
of ri-dimensional subspaces that fully characterize the topology of the density field. 
The final ID manifold of the hierarchy is the fully connected network of the primary 
critical lines of the field : the skeleton. It corresponds to the subset of lines linking 
maxima to saddle points, and provides a definition of the filaments that compose the 
cosmic web as a precise physical object, which makes it possible to compute any of its 
properties such as its length, curvature, connectivity etc... 

When the skeleton extraction is applied to initial conditions of cosmological 
N-body simulations and their present day non linear counterparts, it is shown that 
the time evolution of the cosmic web, as traced by the skeleton, is well accounted 
for by the Zel'dovich approximation. Comparing this skeleton to the initial skeleton 
undergoing the Zel'dovich mapping shows that two effects are competing during the 
formation of the cosmic web: a general dilation of the larger filaments that is captured 
by a simple deformation of the skeleton of the initial conditions on the one hand, and 
the shrinking, fusion and disappearance of the more numerous smaller filaments on 
the other hand. Other applications of the N dimensional skeleton and its peak patch 
hierarchy are discussed. 

Key words: Cosmology: simulations, statistics, observations. Galaxies: formation, 
dynamics. 



1 INTRODUCTION 

The web-like pattern certainly is the most striking feature 
of matter distribution on megaparsecs scale in the Uni - 
verse. The existence of the "cosmic web" (|ZerDovichlll970l ) 
(|Bond fc MversHiggel ) has been confirm ed more than twenty 
years ago by the first CfA catalog (|de Lapparent et al.l 
Il986l) and the more recent catalogs su ch as SPSS 
llAdelman-McCarthv et"alll2008h or 2dFGRS (|Colless et all 
These observations, together with th e dramatic im- 
provement of computer s imulations (e.g. iTevssier et al. I 
(|2008l ') lOcvirk et aLl (120081 )1 have largely improved the pic- 
ture of a Universe formed by an intricate network of voids 
(i.e. globular under-dense regions) embedded in a complex 
filamentary web which nodes are the location of denser halos. 



The traditional way of understanding large scale structures 
(LSS) formation and evolution relies on Friedman equations 
and assumes that LSS are the outcome of the growth of very 
small primordial quantum fluc tua tions by gravit ational in- 
stability (see e.q. lPeebleEl (|l980l ) or lPeeblesI (|l993l ) and refer- 
ences therein). In this theory, the solution for structure for- 
mation is described in terms of a mass distribution that one 
needs to grasp (i.e. by following the evolution of its most im- 
portant features) and compare these to observations. Com- 
prehending the mass distribution as a whole, especially at 
non-Unear stages, is a very difficult task. A possible solution 
therefore consists in extracting and studying simple charac- 
teristic features of matter distribution such as voids, halos 
and filaments as individual physical objects. So far, mainly 
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because of the relatively higher complexity of the filaments, 
most theoretical and computational researches have focused 
on the voids and halos. 

The dark matter halos have arguably been the most 
studied component of the cosmic web. Their density pro- 
files for i nstance are very wel l described by so-called NFW 
profiles (jNavarro et all I1997I) and non-parametr ic models 
are still under investigation (|Merritt et al.|[2006l ). The de- 
pendence of these de n sity profiles on the h alos mass {e.g. 
iBond fc Mw3 (|l996l '). [Lacev fc Cold l| 19931 ') ) has also been 
investigated thoroughly and its relationship with redshift 
and environmental p roperties are a very activ e topics (e.g. 
iHarker et all hOOdi). lAubert fc PichonI hOoSi). I Wang et al.l 



ll200it).lAra.g6n-Calvo et all (|200iD . ISousbie et all (|2008al ') or 
iHahn et alj (|2007l )). From a computational point of view, 
much effort has been put into the development of various 
algorithms to identify halos in simulations and galaxies in 
spectrosco pic redshift galaxy sur veys. The friend-of-friend 
algorithm (jHuchra fc Gelleilll982l ) is now widely spread, as 
well as more c omplex hierarchic al sub-struct ures identifiers 
such as HFOF (|Gottlocbcr"l998l), SUBF IND (|Springel et al.l 
I2OOII). VOBOzTNcvrinck et all |2005| ) or ADAPTAHOP 
(|Aubert etailbOoi T 

Voids are another feature of cosmological matter dis- 
tribution that also have a long history of theoretical and 
computational modeling. The first voids were observed by 
iKirshner et"aLl |l98j) and are in some sense the counter- 
part of halos: the initial quantum perturbations collaps- 
ing into halos at non-linear stages leave room to voids in 
the under-dense re gions. The first theor e tical v oids models 
w here developed bylHoffman fc ShahanJ (| 19821 ). Ilckel (|l984l ') 
or iBertschingeil jWS^ ) among others, while numerical void 
finder s exist, su ch as the one d escribed in lEl-Ad fc Piraiil 
l|l99it ). ZOBOV (|Nevrinckll2008l '). based on Voronoi tesseUa- 
tion, or the recent Watersh ed Void Finder, based on th e 
Watershed transf o rm ( e.g. iBeucher fc Lantueioull (|l979h . 
iBeucher fc Meved (|l993l ')'). by iPlaten et all (|200if ) (see the 
introduction and references therein for a more complete re- 
view of the subject). The improvements in our understand- 
ing of voids and halos properties led to the formulation of 



powe rful theories such as the patches theory (iBond fc Mvers 
19961) the e xtended Press- Schechter theory (e.g. iBond et al. 

I991I) and IShethI (| 19981 )') or the skeleton-tree formalism 

Hanamilliom 



But our investigation of the filaments as individual ob- 
jects is not yet as thorough as for the halos and voids: 
the definition of a well established mathematical framework 
for their study could therefore lead to significant improve- 
ments in our understanding of matt er distribution in th e 
Universe. The first attempts date from lBarrow et al] (|l985h . 
who used a graph-theory construction: the minimal spanning 
tree (MST). This method defines the cosmic web as the net- 
work linking galaxies (or particles from a numerical simula- 
tion), having the property of being loop-free and of minimal 
total length. This technique was later developed in order to 
try quantifying i n an objective way the properties of the cos- 
mic web (see e.o. lGraham et all (|l995h .lColbcrg' f2 007|) and 



a revi ew on the subject can be foun d in Jylartincz fc Saar 



2002 ) ) . The so-cal l ed sha p e finders (ISathvaprakash et al 
19961 ). ISahni et all (|l998l ) or iBharadwai et al" I (l2000l )) al 



networks, uses a marked point process and a simulated an - 
nealing algorithm to trace the filaments (jStoica et ahlbOOsI ) . 
More recently, the skeleton formalism and its local approx- 
imation, that describe the filaments a s particular field lines 
of t he density field , was int roduced bv lNovikov et all (|2006l ) 
and ISousbie et al] (|2008al ) with the advantage of framing 
a well-defined mathematical ground for theoretical predic- 
tions of the filaments properties as well as an efficient nu- 
merical identification algorithm. Finally, an interesting first 
attempt to unify halos, voids and filaments identification us- 
ing the Multiscal e Morphology Filter ( MMF ) technique was 
also proposed bv lAragon-Calvo et al.l (|2007l ). 

In this paper, we introduce a framework and algorithm 
to identify the full hierarchy of critical lines, surfaces, 
volumes... of density distribution in the general case of 
d-dimensional spaces. For 3D space, these critical subspaces 
can be identified to the void and peak patches, as well as 
filaments and other primary critical lines of the distribution. 
The algorithm extracts the filaments as a differentiable 
and, by definition, fully connected networks that traces the 
backbone of the cosmic web. This me thod is closely r elated 
to t he skeleton formalism presented in lNovikov et al.l (|2006l ) 
and ISousbie et al. (l2008ah and i s also b a sed on b o th M orse 
theo r y (see e.g. IColombi et al.1 (|2000l ). iMilnor I (|l963l ) or 
[josti (|l995h ) and an improved Watershed segmentation 
algorithm that uses a probability propagation scheme. 

This paper is organized as follows. In section O we 
present a general definition of the critical sub-spaces that 
we use as well as a method to extract them from sampled 
density field with a sub-pixel precision (focusing more specif- 
ically on the filaments in the 2D and 3D case). In section 
[3] we use this formalism to study the time evolution of the 
cosmic web, and understand the change of its properties 
as a sp ecific object via the truncated Zel'dovich approxi- 
mation (|Zel 'Dovich|[l970l ). Finally, in section |4l we summa- 
rize our findings and discuss a few possible applications to 
N-body simulations and observational spectroscopic galaxy 
surveys. The details of a general simplex minimization algo- 
rithm used in section [5] are presented in Appendix A while 
the general behavior of the inter-skeleton pseudo-distance as 
defined in section [3] is given in appendix B. 



2 METHOD 

The main goal of the algorithm presented here is to allow 
a robust extraction of the non-local primar y critical lines 
amon g w hich the skeleton) as i ntroduced in lNovikov et al] 
20061 ) and lSousbie et all (|2008al ). In these papers, the skele- 
ton was defined as the set of points that can be reached by 
following the gradient of the field, starting from the filament 
type saddle points (i.e. those where only one eigenvalue of 
the Hessian is positive). Let p(x) be the density field, and 
Vp its gradient at position x, the skeleton can be retrieved 
by solving the following differential Equation: 



It 



v = Vp, 



(1) 



low a statistical study of the filaments and another method, 
based on the CANDY model, commonly used to detect road 



using the "filament" type saddle points as initial boundary 
conditions. Because of the difficulty of designing a robust 
algorithm to solve this equation, it was achieved only in 
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2D in iNovikov et alj (|2006l ) and a solution to a l ocal ap - 
proximation in 3D was proposed in ISousbie et alJ (|2008al ). 
This local approximation allowed the extraction of a more 
general set of critical lines linking critical points together, 
the subset of this lines linking saddle points and maxima 
together corresponding to the skeleton (i.e. the "filaments" 
in the large scal e distribu tion of matter in the universe). See 
IPogosvan et al.l (|in prep.h for a discussion of these various 
sets. 

This method works in a very general framework and al- 
lows the extraction of a fully connected non-local skeleton 
as well as an extension of th e pri mary critical l i nes int ro- 
duced in lNovikov et al.1 (|2006l ) and lSousbie et al.l l|2008al ') to 
a hierarchy of critical surface. Following the idea, already 
present in Equation that the topology of a field can be 
expressed in terms of th e properti es its field lines, it takes 
ground in Morse theory l|jostll 19951 ) a nd is roughly based o n 
an extension of the patches theory (|Bond fc MverElll996t ). 
For a sufficiently smooth and non-degenerate field Q 

of dimension d, the peak patches - PP hereafter - 
(resp. void patches - VP hereafter) are defined as the set 
of points from which the field lines solution of Equation 
all converge to the same maximum (resp. minimum) of the 
field. Within this framework, we shall qualitatively show 
that in a d-dimensional space, the skeleton can be thought 
of as the result of d — 1 successive identifications of VPs or, 
equivalently, as the one dimensional interface between at 
least d VPs ( an act ual rigorous demonstration can be found 
m e.g. [jos^ (| 19951 )). Using this definition, extracting the 
skeleton of a distribution thus simply amounts to finding a 
way of robustly and consistently identifying the patches. 

Whether considering a particle distribution obtained 
from a numerical simulation or a density field sampled on 
a grid, the major difficulty arises from the discrete nature 
of the data. In fact, even if the underlying density field is 
supposedly smooth and continuous, the discreteness of the 
sampling implies a relatively large uncertainty on the pre- 
cise location of the patches boundaries, as sampling is lim- 
ited by computational power, which is even more true when 
considering higher dimensions space. The algorithm we use 
is an improved version of the Watershed transform method 
(|Beucher fc Lantueioul|[l979l ). based on a probability prop- 
agation scheme and aims at attributing a probability of be- 
longing to a given patch to every sampled point of the den- 
sity field. This scheme is very general and efficient as it al- 
lows dealing with discrete dataset in a naturally continuous 
fashion and on manifolds of arbitrary dimensions. 



2.1 Probabilistic patches extraction 

The initial idea beyond our patches identification algorithm 
is that a patch can be defined as the set of field lines (i.e. 
curves that follow the gradient of a field) that originate 
from a given minimum (VP) or maximum (PP) of a field. 



^ It would be beyond the scope of this paper to define such a field 
from the mathematical point of view, but it certainly has to be a 
Morse function t hat obeys the Morse- Smale-Floer condition, e.g. 
the discussion in lNovikov et al" 1 1 I2OO6I) . 



Considering a sampled field, being able to identify the 
patches thus amounts to being able to decide, for any given 
pixel p, from which extremum all field lines that cross 
p originate. It is therefore easy to understand that the 
discrete nature of the sampling rapidly plagues such a task: 
for each pixel, considering the measured gradient, one has 
to decide from which, in the fixed number of neighbouring 
pixels, the field line comes from. Within a d-dimensional 
space, having to select between only 3"^ possibly different 
direction for field lines is a crude approximation that leads, 
because of accumulation, to a largely wrong answer for 
pixels located far away from the extrema. 

Although we present the algorithm in the general case 
here, the reader can refer to figure [T] and its legend for a 
simpler and more visual explanation of the algorithm in the 
2D case. More generally, our algorithm involves considering 
each pixel of a sampled field in the order of their increasing 
(resp. decreasing) value, depending on whether we want to 
compute the VPs or PPs and, for each of them, computing 
the probability that it belongs to a given VP {resp. PP). 
This probability map is simply computed by scanning the 
probability distribution of its 3'' — 1 neighbours (within a 
d-dimensional space, here d = 2) and deducing the current 
pixel patch probability distribution from it. Two cases are 
possible: 

(i) none of the neighbours has already been considered 
(i.e their respective densities are all higher -resp. lower- than 
that of the current pixel). This means that the pixel is a local 
minimum (resp. maximum) of the field: a new VP (resp. PP) 
index is created and the probability that the current pixel 
belongs to it is set to 100%. 

(ii) At least one neighbour has already been considered 
(i.e its density is lower -resp. higher- than that of the cur- 
rent pixel) . The current pixel probability distribution is com- 
puted as a gradient weighted average of its lower -resp. 
higher- density neighbours' probability distributions. 

Once all pixels have been visited, a number A'^ of 
patches have thus been created and a list of A*' prob- 
abilities Pt ,k £ {1,..,A}, has been computed for each 
pixel, i. These probabilities quantify the odds that a given 
pixel i belongs to a given patch k. Figure [2] illustrates 
the advantages of our probability list scheme compared 
to the naive approach: without it, the patches borders 
have a strong tendency to be aligned with the sam- 
pling grid and the problem tend to get much worse when 
considering lower sampling and of course higher dimensions. 



Figure 3(c 



presents the results obtained by apply- 
ing this algorithm to the 2D Gaussian random field of 
Figure |3(a)[ On this picture, each patch is assigned a dif- 
ferent shade, and the colour of each pixel is the probability 
weighted average colour of its possible patches. As expected, 
a majority of pixels seems to belong to a definite void patch 
with high probability (close to 100%). In fact, considering 
two neighbouring void patches A and B, all the pixels 
that belong to one of these patches and have a value lower 
than that of the first kind saddle point (s) on their border 
(i.e. where the Hessian only has one positive eigenvalue) 
have a 100% probability of belonging to either A or B. 
Hence, the probabilities of belonging to different patches 
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Figure 1. The different steps of the probabilistic algorithm for finding the patches. The height of the histograms is proportional to the 
density at each pixel of a 2D random field. Top-left: the pixel with lowest density is identified and tagged as belonging to the void-patch 
number 1 (blue colour) with probability P(l) = 1. Top-right: the pixels are then considered in ascending order and arc tagged according 
to the tag of their already visited surrounding pixels. This is repeated until the level of the minimum with second lowest density is 
reached. As this pixel does not have any tagged neighbour, a new void-patch index is added and the pixel is tagged as belonging to 
it (green colour) with probability P(2) = 1. Middle-left and middle-right: the process is repeated until one reaches the saddle point 
with lowest density, located at the border of two patches (middle-left). Above this threshold, a pixel can have several neighbours, each 
tagged with different patch indexes (middle-right). A list of probabilities associated to the different patch index of the neighbouring 
pixels is attributed to the current pixel by computing the density difference weighted average of the respective patches probabilities of 
the surrounding pixels. Bottom-left: repeating the process until all pixels have been visited, one obtains for each pixel a list of possible 
patches index together with their respective probabilities (hence the blurred borders between patches on the picture). Bottom-right: a 
clean border between the patches can be found by defining the index of the patch a pixel belongs to as the one with highest probability. 
It is very straightforward to extend this method to spaces with arbitrary number of dimensions. 
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Figure 2. Illustration of the virtue of the probabilistic algorithm. 
These three curves represent the borders of the void patches ob- 
tained with the probabilistic algorithm, by limiting the maximum 
number of probabilities recorded for each pixel. The black line was 
derived without any limitation, while for the red one two proba- 
bilities were kept and only one for the green one. This last case 
is equivalent to not using any probability list, as only the val- 
ues of neighbouring pixels is taken into account. Also note the 
tendency of the borders to be aligned with the major directions 
of the sampling grid (namely, the sides and diagonals) when not 
taking advantage of the probabilistic algorithm. 



only starts mixing above first kind saddle points. This 



can be seen on the top right zoomed panel of Figure 3(c) 
where probabilities only start blending mildly for densities 
above this threshold (the saddle point are represented by 
the probability "nodes" on the picture). This results in a 
complex distribution of patch index probabilities in the 
vicinity of higher density borders (see upper left panel of 
Figure 3(c) I, and thus a higher uncertainty of the location 



of the void patches border. This uncertainty on the precise 
patch index is directly linked to the location of the skeleton. 
In fact, as explained in the next section, the skeleton can 
also be defined as the set of field lines that do not belong 
to any patch, or in other terms, where sampled pixels have 
an equal probability of belonging to several distinct patches. 



2.2 The d-dimensional skeleton 

As one can easily see, the major strengths of this simple 
patch extraction algorithm are that it is robust and can 
be trivially extended to spaces of any dimensions and 
topology, the only requirement being that one needs to be 
able to define neighbouring relationships between pixels and 
measure distances between them. So we now have a robust 
algorithm for extracting the VP and PP of, in practice, 
nearly arbitrary scalar fields. In this subsection, we show 
that it is possible to generalize the definition of the skeleton 
l|Novikov et al.|[2006l ) to spaces of arbitrary dimension and 
present a simple method to compute the skeleton, as well as 
critical lines and surfaces, based on our patches extraction 
algorithm. 



2.2.1 Definition 

Let us first present important results of the Morse theory 
without de mons t rating them. The more thorough reader 
can refer to lJostI (| 19951 ) for a mathematical demonstration. 

Let us consider the general case of a sufficiently smooth 
and non-degenerate d-dimensional scalar field $d(x), with 
X G Md and Mg a m anifold (i.e R'', the sphere 5"^, ...fl 
Following [jos^ (|l995h . the field lines of $d(x) fill Ma and 
a VP can be defined as the set of points that can be 
reached by following the field lines originating from a given 
minimum of $tj(x). The VPs of $£i(x) thus segment a set 
of d-dimensional volumes that completely fill Aid, each of 
them encompassing exactly one minimum of $(;(x). The 
interface of the VPs, Af^-i, defines a (d — l)-surface (i.e. a 
surface of dimension d — 1 embedded in Md). It is therefore 
possible to apply our probabilistic algorithm to $tj_i(x), 
the restriction of $d(x) to Md-i, in order to extract the 
VPs on this interface. For clarity, we will call the VPs 
of $d-i(x) the first order VPs of 4>d(x), noted 1-VPs 
hereafter. Recursively, the 1-VPs define (d— l)-dimensional 
volumes that pave Md-i, each of them encompassing, 
by definition of a VP, exactly one minimum of <E>tj_i(x), 
with coordinates m £ Md-i C Md, and the reasoning can 
be applied to the whole hierarchy of a- VPs, a G {0, .., d— 1}. 

Starting from a d-dimensional scalar field "I>d(x), it 
is thus possible to define a complete hierarchy of sets of a- 
VPs, a G {0, ..,d— 1}. These a- VPs are (d — a) -dimensional 
volumes that partition Md-a, where Md-a is defined as 
the (d — a) -dimensional interface of the [d — a-\- l)-patches. 
Each set of a- VPs is defined as the set of void patches 
of $tj_Q(x), the restriction of "I>d(x) to Md^a- Let us call 
a critical point, x, of kind n a critical point with Morse 
index (x) = n {i.e. where the Hessian TC (x) has exactly n 
positive eigenvalues). Then, Md-a encompasses the whole 
set of saddle points of kind n ^ d — a, of $tj(x), the minima 
of "l>d-a(x) associated to each a-patch being the saddle 
points of 4'd(x) of kind d — a. The interface Mi is thus 
a curve embedded in Md that links the maxima of $d(x) 
to its saddle points of kind 1: the skeleton of $ti(x). It is 
interesting to note that this approach also allows a rigorous 
definition of the whole set of critical lines similar to the one 
int roduced with t he lo cal approxi mation of the skelet on 
in iNovikov" et al.l (|2006l ) (see also ISousbie et all (|2008al )). 
as well as their extension to critical hyper-surfaces of any 
number of dimensions. 

Although we have only addressed the a- VPs case so 
far, the exact same argumentation holds for the whole hi- 
erarchy of a-PPs, which leads to Md being the skeleton of 
the voids that links minima to saddle points of kind d — 1. 
Moreover, alternating a selection of a„-VPs and Up dp- 
VPs, n„ + Up = d, leads to Md being the curve that links 
saddle points of kind Up to saddle points of kind Up + 1: a 
peculiar set of critical lines of the field. On e can note t hat, as 
rigorously demonstrated in Morse theory (|jostlll995l l. criti- 



^ It will be assumed throughout t his paper t hat the field satisfies 
the Morse-Smale-Floer condition l|jostlll99a) . 
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(c) void patches (d) peak patches 



Figure 3. Figure [3 (a) | represents a 2D density field together with its anti-skeleton (black curve) and skeleton (thick coloured curve). 
The skeleton is coloured according to the value of the index of the underlying void patch, which allows the detection of the saddle points 
(intersection of the skeleton and void patch borders). A skeleton branch starts from a field maximum (large dots) and goes through one 
saddle point before reaching another maximum. Figure p3(b)| represents, for each pixel, the value of the probability that it belongs to 
its most probable patch. By definition, the skeleton is the set of points that do not belong to any patch so the lower this value, the 
more probable the pixel belongs to the skeleton. Figure |3(c)| was obtained by attributing a given random colour to each patch index 
and representing each field with the colour resulting from the probability weighted blend of all patches colours. The zoomed parts show 
patches borders where the uncertainty on the index of the most probable patch index is maximal. The skeleton is represented in white, 
together with its smoothed counterpart (black). Figure [3(d) | represents the peak patches of the same field. 
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cal lines defined in such a way can only link critical points 
whose Morse index only difi^er by unity. 

2.2.2 Implementation 

The representation of the critical lines of a given scalar 
field as a peculiar limit of a peak or void patches hierarchy 
certainly has some mathematical appeal. From a practi- 
cal point of view, although apparently straightforward, 
its direct numerical implementation can nevertheless be 
somewhat problematic. Let G be an initial sampling grid 
and G its reciprocal (i.e. G shifted by half the size of the 
pixels in every direction). Using our patch computation 
algorithm on a scalar field $d(x) sampled over G, we obtain 
for every pixel, i, of G a probability that it belongs to 
a given patch, k. Those sets of probability distributions 
could be used to define a border between the patches and 
thus to compute the 1-PPs and 1-VPs. Nevertheless, this is 
in general not an easy task: one in fact first needs a very 
precise localization of the 1-PPs and 1-VPs (those living 
on the (hyper-)surface of the initial VPs or PPs) to be able 
to compute the following segmentation of the hierarchy (as 
opposed to a density probability). In order to overcome this 
issue, we chose first to base our implementation on a subset 
only of the different patches probabilities and only keep for 
every pixel the index of its most probable patch. This way, 
we are able to simply define the borders between patches 
as the set of pixels of G that overlap at least 2 pixels of 
G with different most probable patch index. The patches 
extraction algorithm can then be applied again over that 
border, restraining pixels examination to the ones that lie 
on its surface. Identifying pixels of G that overlap at least 
2 pixels of G with different most probable patch index, 
one can thus identify the 2-PP or 2- VP and, repeating this 
procedure, all orders of the patches hierarchy. 



For 2D Gaussian random fields, as pictured on fig- 
the skeleton (resp. anti-skeleton) are 



3(d) and 3(c 



identical to the VP (resp. PP) borders and the direct 
implementation of this algorithm leads to a very precise 
and smooth skeleton. But the implementation in spaces of 
higher dimensions raises a critical issue with this simplified 
method, due to the fact that the borders of the a-PPs 
and Q-VPs are only defined by the index of the pixels 
they cross: thus they are jagged and considered locally flat 
(on the scale of one pixel and its direct neighbourhood). 
Figure 4(a) presents the 1-VPs obtained by applying this 
algorithm to a 3D Gaussian random field, each colour 
corresponding to a different 1-VP index. The 1-VPs live 
on the 2D surface which is the border between the cells 
formed by the void patches of the field, each of this cell en- 
compassing exactly one minimum of the field. This surface 
is complex: it can be multiply connected at the interface 
of more than two different void patches and its curvature 
is locally significant. Although neighbouring relationships 
between pixels are easily obtained even where the surface 
is multiply connected, only a rough approximation of the 
actual distances along the surface can be computed, as the 



local curvature is not taken into account. Figure 4(b) shows 



errors in the probability propagation algorithm. This bias 
results in multiple skeleton branches that seem to oscillate 
and cross each other along the true skeleton location. 

In the end, it appears that dropping the full probability 
distribution and approximating borders between patches is 
too coarse an approximation. One solution would involve 
trying to compute the precise location of the a- VPs and 
a-PPs using the full set of probabilities, but, as it will be 
discussed in section [2]3j this raises complex issues. As it is 
the patches interface computation that seems to be difficult, 
the alternative we chose to implement involves computing 
directly the skeleton from the 0— VPs and 0— PPs of the 
field, without having to consider the full hierarchy of a- 



VPs and a- VPs. A close examination of Figure 4(a) led us 
to formulate the conjecture that the (d— l)-VPs or (d— 1)- 
PPs interface corresponds in fact to the subspace of Md-i 
where the manifold is sufficiently multiply connected (i.e. 
where the (d— l)-surface defined by Md-i folds onto itself). 
Equivalently, this locus can be defined in 3-D as the interface 



of at least 3 different PP s or VPs (s ee Figure 4(a) I. This is 
formally demonstrated in |jostl (|l995h . In the general case of 
Md>3, the skeleton should thus be the ID interface between 
at least d VPs or PPs of <l>tj(x). Figure 4(c) presents the 



skeleton obtained using this method on the same Gaussian 



random field as the one used for Figures 4(a) and 4(b) As 
expected, as there is no need to recursively compute the 
full hierarchy of VPs, the resulting skeleton is much more 
precise and well defined. Moreover, a quick comparison to 



Figure 4(b) confirms that it is in fact the approximation of 



the Q-patches interfaces by individual pixels that plagues 
the algorithm, each recursive step exponentially increasing 
the error. 



2.2.3 The skeleton as a set of individual filaments 

The concepts introduced above allow the definition and 
extraction of the skeleton as a fully connected network that 
continuously link maxima and saddle-points of a scalar field 
together. It is certainly of interest to try understanding 
the topological and geometrical properties of this scalar 
field through the connectivity and hierarchy relationship 
that it introduces between the critical points. Applied to 
cosmology, it also allows a formal definition of the concept 
of individual filaments. Considering matter distribution 
on large scales in the Universe, a natural definition of a 
single filament would be a subset of the cosmic web that 
directly links two halos together. The transposition of such 
a definition to the skeleton would allow the introduction of 
useful concepts such as neighbouring relationship between 
halos in the cosmic web sense. It would also make possible 
the study of filaments as individual physical objects, 
similarly to what has been done for years in the literature 
with the halos and voids. 



the corresponding skeleton, computed as the border of the 
1-VPs of Figure |4(a)[ This skeleton is clearly not very well 
defined, the uncertainty in distance computation leading to 



On Figure 3(a) the skeleton (coloured thick network, 
where the colour corresponds to the underlying PP index) 
and anti-skeleton (black network) are superimposed on the 
density field from which they where extracted. Let us define 
a filament as a subset of the skeleton continuously linking 
two maxima together while going through one - and only one 
- first kind saddle point. These saddle points can be easily 
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(a) The 1-VPs of a 3D Gaussian random field 




(b) Recursive algorithm (c) Direct algorithm 



Figure 4. Illustration of the computation of the skeleton as the ID interface of the 1-VPs for a 3D field, <l?3(x). Figure [4(a) [ presents the 
1-VPs of 'I's (x) . The 2D surface, M2 , is computed as the interface of the VPs of <1>3 (x) . The 1-VPs are the void patches of the restriction, 
<E'2(x)! of 3>3(x) to M2. Similarly to picture |3] each colour corresponds to a given 1-VP of <l>2(x), associated to a given minimum of 
<&2(x) (which is also a saddle point of kind 1 of <3?3{x)). The rough appearance of M2 is due to the fact that it is approximated by the 
set of pixels of the sampling grid crossed by the interface of the VPs. The skeleton of Figure [4(b)| is defined as the interface of the 1-VPs 
of Figure |4(a)| its location is not very precise and it seems to oscillate around its "true" location, mainly because only a locally fiat 
approximation of M2 is computed. Conversely, the skeleton of picture [4(c) | is c omputed as the border between at least 3 PPs of 3>3(x) 
or equivalently as the set of points of the surface M2 (pictured on Figure [4(a)[ l which are multiply connected (i.e. where M2 folds onto 
itself). This algorithmically simpler definition leads to a much better defined skeleton. 
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extracted as they are located on the skeleton, at the border 
between the peak (resp. void) patches (i.e where the patch 
index along the skeleton changes, this definition being valid 
for any number of dimensions). This way, all the filaments 
of an N-dimensional distribution can be extracted individu- 
ally by starting from each maximum of the field, following 
all the branches of the skeleton, and storing only the paths 
that cross one saddle point before reaching another maxi- 
mum. This algorithm thus allows the individual extraction 
of filaments as well as a continuous wander of the filamen- 
tary structure of a distribution, which should be very useful 
in a wide range of applications in cosmology. 



i=i j=i 



(2) 



where £]{x) = a; if the j*'' coordinate of pi within p is 1 and 
tj{x) = (1 — a;) if it is 0. Ideally, the skeleton should not 
belong to any VP, so it should be located where all the non 
null values of -P'" (x) are equal. Let us define the arithmetic 
mean of the probability (over the VPs with index k) over 
the pixel 



{P(x)) = l/iV^P'=(x), 



(3) 



and its root mean square. 



2.3 Sub-pixel resolution and skeleton smoothing 

Let us first consider for simplicity a Cartesian sampling grid 
(even though this sub pixel smoothing does not critically 
depend on this geometry, see below). The implementation 
of the procedure of Section [2.21 naturallv leads to a skeleton 
that lives along pixel edges and is thus jagged at the pixels 
scale. The differentiability of the skeleton is nonetheless a 
feature which may be critical for a number of its character- 
istics: its length, curvature, general connectivity ... In order 
to enforce this differentiability, we developed two smoothing 
methods which we use in practice in turn. The first one is 
based on a multi-linear interpolation of the patches probabil- 
ity distribution which Hows naturally from the original algo- 
rithm used to create the skeleton. It provides sub-pixel res- 
olution consistently with the probabilistic framework, thus 
allowing a precise extraction of the skeleton even when the 
sampling is low. The other is used to control the level of 
smoothness away from fixed points (the maxima or the bi- 
furcation points) and can be used to enforce sufficient dif- 
ferentiability. 



2.3.1 Multi-linear sub-pixel skeleton 

Let us first find a way to obtain a sub-pixel resolution on the 
skeleton position based on the patches probability distribu- 
tion of each pixel. The raw skeleton is made of individual 
segments located on the edges of the pixels of a Cartesian 
grid G. Each segment is defined by its two end points, and 
each of them is surrounded by 2'' pixels with a full list of 
possible patches index, together with their respective proba- 
bilities. Recall that the probabilistic algorithm we use works 
on individual pixels so the resulting skeleton position, de- 
fined as the position of the border between several patches, 
is computed with a precision of one pixel. This implies that 
the smoothing procedures may not move the skeleton on 
more than half the size of a pixel. In other words, if we con- 
sider the dual sampling grid, G, of G, the skeleton can be 
freely moved within the pixels of G that its jagged approx- 
imation crosses. So it is sufficient to consider individually 
each of these pixels. Let p be one of these pixels. We then 
know for each of its vertices, pt with i G 1..2'*, the proba- 
bility distribution of the different VPs, P^, where k is the 
index of a VP. In order to obtain sub-pixel resolution, these 
probabilities can be interpolated within p. 

For simplicity, we will only use a multi-linear interpola- 
tion and define P''"(x), the probability distribution of patch 
k, interpolated at point x — {xi, ..,Xd) G [0, 1]'* within p as: 



P(x) 



. ^ (p^x) - {p(x)))^ 
\ k 



(4) 



where the sum is over all the N subscripts k such that there 
exist a pixel pi where VZ 7^ k,Pi > P/. Clearly, all patches 
with dominating probabilities (x) in p are equal when 



P(x) = 0. 



(5) 



Equation ((5)1 is of fourth order and is thus difficult to solve 
in general. 



2.3.2 Approximate quadrics sub-pixel smoothing 

Insights into the solution of Equation ([5} can be found while 
considering the intersection sets of points where pairs of 
probabilities are equal instead of equating them all at the 
same time. These sets are solution of the set of equations 



(6) 



where k and kl are subscripts of the patches that dominate 
on at least one vertex of p. 

For clarity, let us consider the d = 2 case first. With a 
proper indexing of the four pixels pi, 

P'=(X) = P^{1~X^){1-X2)+P^{X^){1-X2) 

+Pi{l-Xi){x2)+P^{xi)ix2), (7) 

Equation (|6} writes in this case: 



A XiX2 + Bxi-\-Cx2 + D^0, 



(8) 



where A, B, C and D only depend on the values of P/°. 
Equation ((Sjl is quadratic and its solutions are well known 
curves of dimension d — 1 = 1 called quadrics. Figure [5] il- 
lustrates solutions of Equation ^ when p is located at the 
intersection of Np = 2, Np = 3 or Np = 4 different VPs. 
In the most frequent configuration where p is at the inter- 
section of 2 VPs, Equation ^ directly gives the first order 
approximation of the intersection of the skeleton and p, and 
we may approximate it by a straight segment. Finding the 
end points of this segment is easily achieved by computing 
the location of equal probability along the two sides of p 



that link vertices with different patches (Figure 5(a) I. The 
Np = 3 configuration is rarer, and concerns only the maxima 
of the field as well as all bifurcation points of the skeleton. In 
this case, we know that three different branches of the skele- 
ton merge within the pixel, at a point where all probabilities 
are equal. So, we may set the bifurcation point as the locus 
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(a) Intersection of 2 patehes 



(b) Intersection of 3 patches 



(c) Intersection of 4 patches 



Figure 5. Illustration of the computation of the sub-pixel skeleton in the case of a 2D bi-linearly interpolated pixel located at the 
border of 2, 3 and 4 different patches. The colour of the pixels vertices represent the index of the dominant patch, while the two-coloured 
dotted lines are the quadrics, solutions of Equation l[6j. These lines are the regions where the probabilities corresponding to the patches 
with similar colours are equal. The underlying blue gradient corresponds to the value of -P(x) (Equation Q), light colours encoding 
lower value. Finally, the black lines represent our approximation of the smoothed intersection of the skeleton with the pixel. 



where all the C| = 3 quadrics of Equation ((Gjl intersect (note 
that the three of them always intersect in a single point as 
P^(x) = p2(x) and P^(x) = P^(x) implies P^{x.) = P^(x)). 
The three branches of the skeleton in p are thus obtained 
by linking the bifurcation point to the iso-probability along 
the three sides of p that link vertices associated to differ- 
ent patches (Figure 5(b) I. Finally, the Np = 4 configuration 
is very rar4j and also more problematic. As previously, we 
know that there exist a bifurcation point within p, but this 
time with 4 different skeleton branches. Since there are now 
6*2 = 6 different Equations (jS)), and given that the solution 
of each of them is a ID quadric, this system is clearly over 
constrained to find the precise location of the bifurcation 
point. A solution may well be to use a higher order inter- 
polation, allowing more complex curves than quadrics for 
equal probabilities regions, or to try solving directly Equa- 
tion ((S)). As this case is clearly rare, it would also be possible 
to approximate the bifurcation point as the barycenter of the 
three points of intersection of the subsets of Equations ([6]) 
taken in pairs. Again, the smoothed skeleton would there- 
fore be derived by linking the bifurcation point to the four 



iso-probability points along the four sides of p (Figure 5(c) I 



2.3.3 Actual recursive implementation 

Having discussed the underlying geometry of the sub-pixel 
multi linear interpolation, let us now turn to our actual 
sub-pixel smoothing algorithm. Indeed, in d-dimensions, 
Equation (|6]) is of order d and is linear in each of the 
d space coordinates Xi. Its solutions are thus d — 1 di- 
mensional quadrics whose intersections, as in 2D, can be 
used to recover the skeleton position down to a sub-pixel 



note that the scarcity of these points is directly related to res- 
olution, i.e. whether or not the skeleton is featureless at the sub- 
pixel scale. Hence these points may occur more often in higher 
dimensions, which for computational reasons may be relatively 
under-sampled. 



precision. Finding intersections of quadrics in general 
remains nonetheless a highly difficult (or even untractable!) 
problem and even state-of-the-art solvers can only achieve 
such a performance for d = 3 at most. To circumvent this 
difficulty, we thus opted in practice for a different solution 
that consists in a recursive numerical minimization of the 
value of P(x) over the hierarchy of n-cubes (i.e hypercubes 
of dimension n), n £ {l,..,d}, that are the faces of each 
cell of the sampling grid. The trick is to always reduce the 
problem to a ID minimization of a polynomial of order d 
(see appendix A). Figure |B] illustrates the full process in 
3D. Let us consider the grid cell of Figure 6(a) located at 



the interface of 4 different patches. The skeleton extraction 
algorithm produces the jagged skeleton represented in red. 
In order to improve its resolution, we first consider each of 



the 12 edges individually (see Figure 6(b) I and determine 
for each of them the point of equal probability for the two 
patches that dominate at the end points of segment. Of 
course this point only exists if different patches dominate 
at the end points of a segment and we thus obtain at most 
12 new points (7 in this instance, represented by the red 
crosses). The edges of a cube can be considered as its "one 
dimensional faces" or 1-faces. The following step consists 
in examining the configuration of its 2-faces, usually called 
faces for a 3D cube. Figure 6(c) illustrates the configuration 
of these 6 faces together with the iso-probability points 
computed over their edges. We know that at least 3 different 
patches have to dominate on at least one of the 4 vertices 
of each face for a skeleton branch to enter the cell through 
this face. Using the minimization algorithm presented in 
Appendix A and the iso-probability points on the edges, it 
is thus possible to compute, over these faces, the location 
of the minimum of P(x) (represented as blue crosses on 
Figure 6(c) I. Finally, considering the 3-face of the cell (i.e. 
the cube itself), one can determine the point of minimal 
value of P(x) over the cube, which is the point where the 



skeleton branches connect (see figure 6(d) I 



The generalization of this algorithm is relatively 
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(a) 



(b) 



(c) 



(d) 



Figure 6. Illustration of different steps of the recursive algorithm used to obtain sub-pixel resolution for the skeleton, in 3D. The colour 
of the balls represent the index of the patch with maximal probability while the intersection of the skeleton and the cell is displayed in 
red. The algorithm consists in recursively considering the n-dimensional faces of the sampling unit volume (here an hypcrcube). For a 
3D Cartesian sampling grid, one starts equating dominant probabilities on the vertices of edges, then faces and finally the cube. 





Figure 7. Illustration of the skeleton with a sub-pixel resolution 
in 2D. The background pixels colour represent the sampled den- 
sity field while the black skeleton was obtained using our proba- 
bilistic algorithm. The purple skeleton is the post-treated version 
of the black one. Note how any sampling grid influence disap- 
peared, especially in the originally vertical segment located in 
the upper-left corner of the image. 



Figure 8. A failure of the skeleton sub-pixel algorithm due to the 
lack of sampling resolution. The dotted grid represents the recip- 
rocal sampling grid, G, while the pixels colour represents their 
dominating patches and the initial raw skeleton is represented in 
red. The green skeleton is the result of applying the sub-pixel res- 
olution algorithm while the blue one was obtained from the green 
one, after removing one pixel sized loops and smoothing. 



straightforward. Let us again consider a cell that is a hy- 
percube of dimension d. We know that the skeleton inter- 
sects this cell if at least d of its vertices have different max- 
imal probability patches index. In that case, the sub-pixel 
resolved skeleton can be recovered by considering all the n- 
faces of the hypercube, n£{l,..,d— 1}, in ascending order 
of their dimension n. When considering a p-face, we mini- 
mize the value of P(x) in order to obtain the point where its 
vertices respective patches have equal probability, using the 
points obtained from the {p— l)-faces. This point only exist 
for a p-face if at least (p + 1) vertices most probably belong 
to different patches. In the end, one thus obtains a number 
of points from the (d— l)-faces that are the points where the 
skeleton enters the cell and one point for the d-face (i.e. the 
cell itself), which gives the location where different branches 
of the skeleton connect. Figure [7] illustrates the result of ap- 
plying this algorithm in the 2D case. 



2.3.4 Artifacts correction and differentiability 

Though the method presented above to obtain sub-pixel 
resolution works most of the time, there nonetheless exist 
situations where it can fail due to sampling effects. Figure [8] 
illustrates such a situation, which can sometimes occur 
when the sampling grid pixel size is not totally negligible 
compared to the average extension of the patches. When 
the thickness of a peak or void patch is smaller than a pixel 
size, it can in fact lead to mistakingly isolated subregions of 
size one pixel, implying the creation of spurious loops in the 
skeleton (in red). This phenomenon, although rare, occurs 
in spaces of arbitrary dimension and triggers artifacts when 
applying our sub-pixel resolution algorithm. The green 
skeleton on Figure |8] presents such an example of a spurious 
skeleton loop. 

In order to fix these anomalous segments, we chose to 
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post-treat the skeletons by opening-up all one-pixel sized 
loops and smooth the resulting skeleton to enforce a desired 
level of differentiability in the skeleton trajectory (see the 
blue skeleton of Figure [8]) . The smoothing method that we 
use presents the advantage of being quite robust, and in- 
volve fixing some specific points of the skeleton, and averag- 
ing the position of each non-fixed segments end points with 
the position of its closest neighbouring end points a number 
of times. Let be the j**" coordinates of the i*** sampled 
skeleton location (among A'^) between two fixed points. Be- 
fore smoothing, all a;* are located on the edges of G and we 
can define their smoothed counterparts as j/], computed as: 



with 

3/4 
1/2 
1/4 



i = j, 
i = j + l 



i=j = N, 
« = J - 1, 



(9) 



(10) 



elsewhere, 



where Equation ^ is applied s times in order to smooth 
over s segments. Basically, Equation (|9]) is used to compute 
smoothed coordinates j/* as a weighted average of the orig- 
inal coordinates Xj together with the coordinates of its 2 
direct neighbours, Xj~^ and s*"*"^. Applying this scheme s 
times thus produces the final smoothed coordinates y] to be 
a weighted average of x'^ and its s closest neighbours along 
the skeleton. 

This smoothing technique introduces two parameters 
of importance: the skeleton smoothing length s, and the 
type of fixed points. In order to determine the optimal 
value of s, it is possible to minimize the reduced cor- 
responding to the discrepancy between and supple- 
mented by a penalty corresponding to the total length of 
the skeleton (over-smoothing will increases the discrepancy, 
under-smoothing will increase the total length). In practice, 
though, as a post treatment to an already smooth skeleton 
(using the sub pixel probabilities) , this choice is not critical. 

The choice of the skeleton points that should be fixed 
before smoothing depends of the planned application; in 
practice, we implemented two possibilities: (i) fixing the field 
extrema, or (ii) the bifurcation points of the skeleton (i.e the 
points of the skeleton where two filaments merge into one). 
Figure |§] illustrates the infiuence of this choice on the shape 
of the smoothed skeleton. By fixing the extrema of the field, 
one ensures that the skeleton subsets that link these ex- 
trema are treated independently: this is the solution used 
to study the properties of individual filaments in the dark 
matter distribution on cosmological scales. One should note 
that, in this case, the parts of the skeleton that belong to 
several individual filaments are duplicated (see the red skele- 
ton on Figure [9|, affecting global properties of the skeleton 
such as its total skeleton length. In contrast, fixing bifur- 
cation points enforces the smoothing of the skeleton while 
conserving its global properties. 



2.4 Summary 

Let us finally recap the main steps involved in the extraction 
of the fully connected skeleton in a d-dimensional space. 

(i) The density field is sampled and smoothed in order 




Figure 9. Influence of the choice of fixed points on the shape 
of the smoothed skeleton. The original skeleton is represented 
in green, while the red and blue skeletons are smoothed s = 6 
times, while fixing the field maxima and the bifurcation (i.e mul- 
tiply connected) points respectively. In both cases, the smoothed 
versions always stay within half the size of a pixel distance 
from the original non-smoothed skeleton. On this illustration, the 
smoothed skeleton was computed directly from its raw jagged ver- 
sion to emphasize the eff'ect of the choice of different fixed points. 
This discrepancy between the two options is considerably weak- 
ened if the skeleton is previously post-treated. The background 
colour corresponds to the weighted probability each pixel has to 
belong to a definite patch. 



to ensure sufficient differentiability. A smoothing scale of at 
least 5 pixels is recommended when using a Cartesian grid. 

(ii) All pixels are considered in the order of their ascend- 
ing (or descending) density. Depending on their neighbours, 
they are labelled as minima (or maxima), or assigned a list 
of probability to belong to a given VP (or PP) following the 
algorithm of section [2Tl1 

(iii) Considering only the patch index with highest prob- 
ability for each pixel, skeleton segments are created on pixel 
edges when at least d surrounding pixels among 2'' have a 
different most probable patch index. 

(iv) Calling a vertex connected to more than two seg- 
ments a node of the skeleton and considering each node, the 
sets of connected segments that link them to other nodes 
are recorded in order to later recover the information on the 
skeleton connectivity (and allow a continuous wander along 
the fully connected skeleton). 

(v) The sub-pixel smoothing procedure of Sec. 12.3.31 is 
implemented. All the vertices of the skeleton segments are 
considered one by one together with the value of the prob- 
ability distribution in the center of the surrounding pixels. 
According to the sub-pixel algorithm, the extremities are 
moved in order to obtain a differentiable skeleton. 

(vi) Configurations that are identified as problematic are 
corrected for following the method described in section [2.3.4l 
and the resulting skeleton is smoothed over a few pixels (usu- 
ally d of them) while fixing either bifurcation points or max- 
ima/minima. 

(vii) Eventually, individual filaments can be extracted 
(and tagged) following the method of section [2.2.31 
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Figure 10. the 3D peakpatch colour coded probability function: warm colours correspond to equiprobability regions, dark colours to 
regions where one probability dominates (see also Figure [3(b)[ l. This supplementary map complements the peakpatch map in the present 
algorithm and allow for a precise sub pixel segmentation and skeleton extraction. Note the extended equiprobability sheets corresponding 
to places where the exact position of the filament will be more uncertain. 



Figures [TT] and [TJ] show a 3D skeleton computed from a 
simulated density field at 2 = 0, sampled over only 128'' 
pixels. 



Note that in this paper, we did not address the issue 
of shot noise that has for long been known to be a problem 
for most segmentation algorithm s, and in partic ular for 
Watershed techniques (see e.g. iRoerdin^ ([l99^ for a 
review on the subject). In fact, shot noise often leads to 
over segmentation, and numerous complex techniques have 
been developed to try and compensate for it. Instead, we 
chose h ere to follow t he app roach used in Novikov et al.l 
(|2006D . ISousbie et all (|2008al ) and ISousbie et all (|2008bt ). 
that involve simply filtering the sampled fields using a 
Gaussian kernel on large enough scales (in terms of number 
of sampled pixels) so that it is possible to consider that 
the sampled field is a smooth enough representation of the 
underlying field. A clear disadvantage of this method is that 
it introduces a particular smoothing scale and thus adds 
one parameter (the smoothing scale) to take into account 
when considering sets of critical lines an surfaces computed 
on a field. A short study of the robustness of the skeleton 
extraction in the case of a smoothed scalar field is presented 
in appendix C. Improvements over this shortcoming are 
postponed to further investigations. 



Regarding performance, the computing time and mem- 
ory consumption for the extraction of the skeleton mainly 
depends on three parameters: the number of pixels Np, the 
smoothing length L in units of pixel size and the number of 
dimensions A''^. Most of the computational power is spent 
during the first step of the process: the propagation of the 
probabilities to compute the patches. For a constant value 
of L and Nd, the algorithm speed is linear in Np, and so 
is the memory consumption. A smaller value of L implies 
more smaller patches, which therefore have proportionally 
more borders with each other thus increasing the number of 
different probabilities to propagate. Indeed, for very small 
values of L, memory consumption is largely increased as 
well as the computational time; it seems reasonable to keep 
L above a minimal threshold of L J5 5 pixels (which in any 
case is also necessary to ensure sufficient differentiability of 
the sampled field). Finally, the value of Nd is most critical 
to memory consumption and speed, not only because Np 
should increase with Nd to keep a constant sampling reso- 
lution, but also because the number of neighbours for each 
pixels scales as 3^'' for a Cartesian grid. The computational 
time and the memory consumption follows, as the number 
of different probabilities to keep track of is also much in- 
creased (each pixels having many more neighbours, the ratio 
of patches interface surface to their volume increases and so 
does the number of different probabilities to propagate, on 
larger distances). For the different skeletons presented in this 
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Figure 12. The 3D skeleton of the simulation of the cosmological density density field in a 50h^^ Mpc box with gadget-2 (see also 
Figure UTi . This skeleton was computed from a 128'^ pixels sampling grid smoothed over 5 pixels (fa 2h~^ Mpc). The skeleton colour 
represents the index of the peak patch (which provide by construction the natural segmentation of filaments attached to the different 
clusters.). Movies of 3D skeletons can be downloaded at | http:/ / www2. iap.fr/users /sousbie/ 



paper, to give and order of magnitude, for a single modern 
CPU, 2D skeletons of 1024^ pixels smoothed over I « 10 pix- 
els are computed in a matter of few seconds and the memory 
needed is of order « 10 MBytes. Computing a 3-D skeleton 
on a 128^ pixels grid with L ~ 6 takes approximately 1 
minute and a hundred of MegaBytes of memory, while for a 
512^ grid, it takes about an hour and around 14 GBytes of 
memory are used. While 473 skeletons are still tractable at 
a descent resolution , higher dimensionality seems difficult 
to reach with present facilities without implementing a fully 
parallel version of the code. 



3 AN APPLICATION: VALIDATING THE 
ZEL'DOVICH MAPPING 

The scope of application of the algorithm presented in Sec. [2] 
is vast (see Sec. |4] for a discussion). Here we shall focus on a 
simple example which makes use of one of the clear virtues of 
the above implementation: it allows us to identify as physical 
objects the filaments present in the matter distribution on 
cosmological scales, and see how these objects evolve with 
time. 

Specifically, we intend to show, using the skeleton as a 
diagnostic tool, that a relatively simple but powerful model. 



namely the trun cated Zel'dovich approximation mapping 
(|ZerDovichlll97Gl ). can capture the main features of the cos- 
mic evolution of the web. Indeed predicting the evolution of 
matter distribution from the point of view of the topology 
and the geometry of th e cosmic web has been a recurrent 
issue in cosmology (e.g. iBond fc MversI (|l996l )) and is be- 
coming critical as the geometry of the cosmic environment 
is n ow believed to play a key role in shaping galaxies (see, 
e.g. lOcvirk et ahl (|2008h ). 

Being able to carry such an extrapolation from the ini- 
tial condition to the present day distribution of filaments 
should lead to a simplified and broader understanding of 
large scale structures in the Universe, in the same way the 
concept of clusters as important physical objects gave birth 
to the hierarchical model of structure formation. The fully 
connected skeleton encompasses both the geometry and the 
topology of the cosmic web: it is therefore the ideal tool 
to validate this mapping between the initial condition and 
the present day distribution of filaments. Understanding and 
partially correcting for the distorsion induced by the proper 
motions of the structures is also of prime imp ortance when 
dealing with observationnal data sets (see e.g. iPichon et al.l 
42003)). 

The principle of the Zel'dovich approximation (ZA here- 
after) is to make a first order approximation, in Lagrangian 
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Figure 11. The 2D projection of a 3D skeleton computed on 
a simulation of the cosmological density density field in a 50h~^ 
Mpc box with gadget-2. This 20h~^ Mpc thick section of skele- 
ton was computed from a 128"^ pixels sampling grid smoothed 
over 5 pixels 2h~^ Mpc). The skeleton colour represents the 
index of the peak patch. Note that the 2D projection of a 3D 
skeleton differs from the skeleton of the 2D projection, hence the 
discrepancy between the skeleton and apparent filaments. 



coordinates, of the motion of the collisionless dark matter 
(DM) particles. The motion of these particles from the ini- 
tial mass distribution in Lagrangian coordinates q to their 
Eulerian coordinates x can therefore be described as; 



x(z,t) = q-fD(z) /D (zO*.(q), 



(11) 



where z is the redshift, D (z) the growth factor, and ^'i (q) 
the displacement field, computed in the initial matter dis- 
tribution as: 



*.(q) 



-2D 



(12) 



where H is the Hubble constant, Q the quantity of energy 
in the Universe, (j) the gravitational potential and the 
subscript "in" stands for initial conditions. The truncated 
Zel'dovich approximation simply consists in filtering short 
scale modes of the initial power spectrum before computing 
the displacement field in order to prevent shell crossing 
effects. It has bee n shown to impr ove the precision of 
the approximation jColes et al.l Il993l ). As we are mainly 
interested in the large scale behavior of the cosmic web, 
the smoothing scale, L = Lnl ~ 3.94 Mpc, that we use 
hereafter to compute ^'i roughly corresponds to the scale of 
non linearity at 2; = 0, as the so called truncated Zel'dovich 
appro ximation has been shown to work best above this 
scale (jKofman et al.lll992l '). It was computed as the scale at 
which, in the simulation, the smoothed density field, p{L), 
is such that a^(I/NL) = ((p(I/nl) — p(I/nl))^) — 1 at z — 0. 



3.1 Simulation and skeletons 

The numerical simulation that we use in this section was 
comput ed with the pu blicly available N-body code GAD- 
GET2 (|Springe]| 120051 ). It corresponds to a dark mat- 
ter only cosmological simulation of 512'^ particles within 
a 250/i~^ Mpc cubic box, considering a ACDM concor- 
dant model {Ho = 70, fib = 0.05, as = 0.92, Qa = 
0.7 & Slo = 0.3). In order to study the evolution of 
the cosmic web, a set of reference skeletons, Ssimuiz, L), 
were computed from different snapshots, at redshift z = 
{0,0.15,0.3,0.5,0.66,1.15,3,5,10}, where z = Zi = 10 cor- 
responds to the redshift of the initial conditions of the sim- 
ulation. These skeletons were computed on density fields 
generated by sampling the particle distribution of the re- 
spective snapshots on a 512'^ grid and after smoothing with 
a Gaussian kernel of size L — Lnl ~ 3.94. In order to under- 
stand if the truncated Zel'dovich approximation is able to 
capture the essential features of cosmic web, these skeleton 
are compared to difi'erent sets of skeletons, generated using 
the truncated Zel'dovich approximation in different ways: 

• Sza{z, Ltih)'- This set of skeletons is generated by 
applying the Zel'dovich approximation to the DM particles 
of the simulation in the initial conditions. The displacement 
field is computed after smoothing over the scale Lnl and 
the resulting distribution is sampled and smoothed over the 
same scale to generate the skeletons. 

• Ssza{z, L-m-l): these skeletons are generated by apply- 
ing the Zel'dovich approximation directly to the skeleton 
of the initial conditions. The initial condition simulation 
{zi = 10) is sampled and smoothed over the scale Lnl to 
compute its skeleton. The displacement field is computed 
on the same field, but smoothed over a scale Li « 8.81 
Mpc (such that a^{Li) = 0.5 at z — 0) and the Zel'dovich 
approximation is applied to each segment of the initial 
condition skeleton. We use a larger truncation scale 
for the Zel'dovich approximation here in order to pre- 
vent shell crossing, which can be tolerated when applied to 
particles but would result in a very fuzzy displaced skeleton. 

• 5zal(2:, ^nl): same as Sza{z , Lt^-^) , but with a dis- 
placement field smoothed over the scale Li, in order to 
check the influence of this choice on Ssza{z, I/nl). 

• Sza{z, Lcot)'- same as Sza{z, Lt^h), but the sampled 
field is smoothed on a scale Lcor instead of Lnl to take 
into account that the Zel'dovich approximation introduces 
an artificial additional smoothing scale (see below). 



3.2 Skeleton length 

There exist many different ways to compare one dimensional 
sets of lines within a 3D space, but one of the simplest cer- 
tainly involves comparing their lengths. Figure [13] presents 
the measured length per unit volume of the different sets 
of skeletons (described above) as a function of redshift. Let 
us first consider the length of >5Bimu (z, ^nl) (purp l e curve 
wit h discs symbols). It wa s shown in lSousbie et al.l (|2008ah 
and ISousbie et al] (|2008bl ) that, whereas for scale invariant 
fields such as the initial conditions of the simulation, the 
length of the skeleton is expected to grow as L"^ {L being 
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Figure 13. Measured length of the skeleton per unit volume as 
a function of redshift z. The length density was measured on the 
simulation {purple discs), its truncated Zel'dovich approximation 
whose displacement field was computed using a smoothing length 
L Ki 3.94 such that cr{L,z = 0) = 1 (red squares), or L; 8.81 
such that cr{L, z = 0) = 0.5 (green crosses), and finally using the 
displacement field of the Zel'dovich approximation at scale L;, 
applied directly to the skeleton of the initial condition, at ^ = 10 
{blue triangles). The black dashed line stands for the length of the 
skeleton in the initial conditions (at z = 10), while the dotted line 
represents the length measured using the Zel'dovich approxima- 
tion on the initial condition while taking into account the effec- 
tive smoothing introduced by using the Zel'dovich approximation. 
This recipe yields the best match with the simulation. Except for 
this last case, the skeletons where computed after smoothing the 
density field with a Gaussian kernel of width L. 



the smoothing length), it grows in fact as ~ L^^'^^ around 
z = for ACDM simulation. Note that the fact that the 
length of 5simu(z, ^/nl) decreases with time seems consis- 
tent with the expected evolution of matter distribution in 
the case a cosmological constant, where the expansion ac- 
celerates around z « 1. In that case, matter in fact tends 
to form separate distant heavy halos; more numerous small 
filaments on smaller scales shrink and melt into each other 
as dark matter halos merge, while larger scale filaments tend 
to stretch: the net result is a total length decrease. 

This process seems to be well captured by the 
Zel'dovich approximation as the length of 5za(z, Lnl) {red 
curve, square markers) exhibits the same time evolution 
as the length of Saimuiz, L). The discrepancy between the 
measured length in the simulation and with Zel'dovich's 
approximation is nonetheless of the order of ~ 10% at z = 0. 
This disagreement should be explained in part by the fact 
that the Zel'dovich approximation uses a displacement field 
computed from a smoothed version of the initial condition 
density field, thus introducing an additional smoothing that 




Figure 14. Ratio of the length of the skeleton measured in 
the simulation to the length of the skeleton of the Zel'dovich 
approximation as a function of time, a. The dashed line represents 
the best fit of the data (red squares). 



one should take into account when computing Sza- 

The measure of the ratio, r, of the length of 
5simu(2, ^/nl) to the length of iSza(2,-Lnl) as a function of 
time, a, is displayed on Figure 1141 It appears that r is ap- 
proximately a linear function of time, a, and can thus be 
fitted as 



r = 1.00 -f 0.14 {a -at), 



(13) 



where a; — 1/ {1 + Zi) ~ 0.09 is the time of the initial con- 
ditions from which the Zel'dovich approximation was com- 
puted. Moreover, the fact that the value of r is relatively 
close to unity confirms that the artificial smoothing intro- 
duced by the Zel'dovich approximation is small; we chose 
to model it as a convolution with a Gaussian kernel of size 
LzA. The effective Gaussian smoothing used on Zel'dovich's 
approximation has scale L^s and is thus the result of the 
composition of two Gaussian smoothing of scale Lza and 



+ L^ 



(14) 



Using equations (|13fl and (|14p . and the fact that the 
skeleton length grow s with smoothing scale as ~ L~^'^^ 
ISousbie et aLll2008bh . the value of Lza one should chose 
to get the best match with ACDM simulations is thus 



2-0.14 
1.75 



1/2 



0.4iNL\/a~~a7. (15) 



In order to compute a skeleton that is comparable to 
5simu(2, Lnl), one should therefore smooth the distribution 
obtained using the Zel'dovich approximation on a scale Lcor 
such that 



Lcor — vLl\ 



Lz 



= LnlV1.00-0.16 {a -at). (16) 



On Figure [13] the dotted black curve represents the measure 
of the length of Sza{z, Lcoi), when the effective smoothing 
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introduced by the Zel'dovich approximation is taken into 
account. The agreement with the measurements in the sim- 
ulation is significantly improved compared to the naive ap- 
proach; this suggest that the Zel'dovich approximation can 
be used to predict the shape of the evolved cosmic web from 
the initial conditions distribution only. 

Of course, the length is only a global characteristic 
of the skeleton and it certainly does not fully constraint 
its shape. Higher order estimators that can compare the 
relative position and shapes of skeletons are needed to 
quantify how good an approximation the skeleton obtained 
by Zel'dovich's approximation is. 

Before doing so, let us consider an alternative form of 
the Zel'dovich approximation, where, instead of displacing 
the particles from the initial conditions of the simulation 
to derive the evolved density field, we directly use the dis- 
placement field to evolve the skeleton of the initial condi- 
tions. This method will be called here the skeleton Zel'dovich 
approximation (SZA hereafter), and the resulting skeleton 
5szA. Studying the properties of 5sza is interesting as it 
should make it possible to distinguish between two differ- 
ent processes that affect the properties of the cosmic web: 
the simple deformation of the initial cosmic web on the 
one hand and the creation or annihilation of filaments on 
the other hand. Indeed, 5sza reflects only the modiflca- 
tion of the skeleton due to its deformation while 5za also 
takes into account the merging and annihilation of filaments. 
Note nonetheless that by definition, the locus of the skele- 
ton for the SZA is biased toward higher density regions; in 
these regions, non-linear effects inducing shell-crossings in 
the Zel'dovich approximation are more likely. To be con- 
servative, we thus use a larger smoothing length than Lnl 
to compute the displacement field. This smoothing length, 
Li ~ S.Slh^^ Mpc, was chosen such that a{Li, z = 0) = 0.5; 
the green curve {cross markers) of Figure [T^ shows that us- 
ing Li or Z/NL does not make any difference regarding the 
length of the skeleton. On this figure, the blue curve {triangle 
markers) depicts the evolution of the length of Ssza{z, -Lnl): 
its behavior is clearly opposite to the 5za case, as the length 
rises with time. Although surprising at first sight, this result 
only confirms our previous interpretation of the evolution of 
the cosmic web. In fact, if the SZA can nicely capture the 
large scale evolution of long filaments, the smaller ones can- 
not melt into each other, which induces several small scale 
filaments to be located at the same loci, where only one piece 
of filaments should have been measured. The disappearance 
of the smaller scale filaments does not compensate anymore 
for the expansion of large scale filament: the net result is 
thus an increase of the total measured length of 5sza with 
time. 



3.3 Inter-skeleton pseudo-distance 

Let us now define a way to compute a pseudo-distance b e- 
tween two different skeletons (see also lCaucci et al.ll|2008h ). 
In practice, a skeleton S is always computed from a sampled 
density and thus has a maximal resolution Rs . It can there- 
fore be described, without loss of information, as the union 
of a set of A'' straight segments 5' of size -Rs. We define a 
pseudo-distance from a skeleton Sa to a skeleton Sb, 'D{a, b), 
as the probability distribution function (PDF) of the min- 



imal distance from the A" segments 5^ to any of the A*" 
segments S^. In practice, our algorithm applied to a density 
field sampled on a Cartesian grid naturally leads to a skele- 
ton described as a set of segments of size the order of the 
sampling resolution. Hence we directly use these segments 
to compute inter-skeleton distances. 

Note that there is no reason, in general, for T>{a., h) 
to be identical to 2?(b, a); this discrepancy, together with 
the value of the different modes of the PDFs, do in fact 
quantify the differences between Sa and Sb (see appendix 
B for details on how to interpret pseudo-distances PDFs). 
The upper and lower panels of Figure [15] present the 
pseudo distance measurements obtained by comparing 
5simu to <SzA and 5sza respectively. A close examination of 
Figure |15(a)| confirms the hypothesis we made in previous 
subsection. First, the high correlation of 5za and 5aimu 
{bold curves) for any redshift, is demonstrated by the 
localization of the mode around d « 600/i~^ kpc, well 
below the smoothing length Lnl = 3.94 Mpc. Second, 
the asymmetry between the PDFs of ©(ZA, simu) and 
D(simu, ZA) follows from the fact that 5simu has smaller 
scale filaments that have no counterpart in 5za (the mode 
intensity is higher for D(ZA, simu) than ©(simu, ZA)). 
This is exactly what should happen if <Sza was effectively 
smoothed on a scale larger than 5simu. The thin curves, 
for which the effective Zel'dovich approximation smoothing 
was taken into account, confirms this, as the asymmetry is 
completely removed in that case. 

It is also interesting to look at the distance PDFs 
between 5sza and 5aimu (see Figure 15(b) I. Except for high 
redshifts {z = 5), the general intensity of the modes are 
lower for ©(SZA, simu) than for I'(ZA,simu), suggesting 
that the Zel'dovich approximation is a better description 
of the evolution of the filaments on large scales, and that 
filaments mergers and creation are important processes. 
The general position of the modes is still comparable, which 
means that SZA is nonetheless successful in describing the 
evolution of the general shape of the cosmic web. Also, 
the asymmetry between D(SZA, simu) and Z)(simu, SZA) 
suggests that 5sza has more small scale filaments than 
5simu. These observations confirm our previous assumption 
that although the cosmic web evolves in a simple inertial 
way on larger scales (a process captured by SZA), the 
shrinking and fusion of the more numerous smaller scale 
filaments is the cause of the general length decrease of the 
cosmic web (as suggested by a simple visual examination 
of a (50/i"^)^ Mpc^ subregion of 5simu, <Ssza and 5za on 
Figure [in}. 

The above investigation opens the prospect of correct- 
ing for the peculiar velocities of galaxies induced by gravita- 
tional clustering, and carr y an Alcock-Paczynski (AP) test 
(|Alcock fc PaczvnskilllQTgl ) on the skeleton of the large scale 
structures of the universe. In short, the AP test compares 
observed transverse and longitudinal distances to constrain 
the global geometry of the universe. Galaxy positions are 
usually observed in redshift space which induces an impor- 
tant distortion between the distances measured along and 
orthogonally to the line of sight, which plagues the regular 
AP-test. Our analysis suggests that it is in fact possible to 
correct through the Zel'dovich approximation for the dis- 
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Figure 15. The inter-skeleton distance as defined in the main text and appendix B, applied to the skeletons of the simulation and the 
Zel'dovich approximation (Figurc [l5(a)[ l and the skeletons of the simulation and displaced initial conditions skeleton, SZA (Figure [l5(b)| . 
The displacement fields and skeletons are computed after smoothing the field on a scale L as 3.94 such that a{L, 2 = 0) = 1, except 
for SZA, where the displacement field was obtained after smoothing over L; ?d 8.81 Mpc, such that (t(L, 2 = 0) = 0.5. The full lines 
represent the distance from the simulation's skeleton to the other, while the dotted lines represent the reciprocal distance. The thin lines 
on Figure [l 5 (a) | stand for the case where the effective smoothing introduced by the Zel'dovich approximation is taken into account. Note 
that SZA PDF is more skewed as the merging/annihilation of filaments is then not taken into account. 



tortions induced on the cosmic web. Having carried such a 
correction, we expect that the measure of the anisotropy of 
the observed skeleton length ratio could yield a good con- 
straint on the value of the cosmological parameters. Note fi- 
nally that The Zel'dovich mapping smoothed with Lcor (see 
Equation (|16p ) could be used to generate synthetically sets 
of extremely large cosmic skeletons probing exotic cosmolo- 
gies using codes such as mpgrafic (jPrunet et al.|[200^ ) to 
generate the initial conditions and their Zel'dovich displace- 
ment. This construction could then be populated with halos 
and substructure using semi analytical models. Note finally 
that the total length and the skeleton's distance are two 
probes amongst many on how to characterize the difference 
between two skeletons. Moreover, there are other means to 
quantify the evolution of the cosmic web. For instance, an 
interesting statistics would be to find out how often the re- 
connection of the skeleton occurs as a function of redshift. 



4 DISCUSSION AND PROSPECTS 

We have presented a method, based on an improved 
watershed technique, to efficicently compute the full 
hierarchy of critical subsets from a density field within 
spaces of arbitrary dimensions. Our algorithm uses a fast 
one pass probability propagation scheme that is able to 
improve significantly the quality of the segmentation by 
circumventing the discreteness of the sampling. We showed 
that, following Morse theory, a recursive segmentation of 
space yields, for a d-dimensional space, a succession of d — 1 
n-dimensional subspaces that characterize the topology of 
the density field. In 3D for cosmological matter density 
distribution, we particularly focused on the 3D subspaces 



which are the peak and void patches of the field (i.e. the 
attraction/repulsion pools) and the ID critical lines which 
trace the filaments as well as the whole primary cosmic 
web struct ure [i.e. a fully-conn ected, non-local skeleton as 
defined in iNovikov etalT (|2006l )). For the primary critical 
lines, we also demonstrated that it is possible to use the 
probabilities distribution from our algorithm to derive a 
smooth and differentiable skeleton with a sub-pixel level 
resolution. Thus this method allows us to consider the 
cosmic web as a precise physical object and makes it 
possible to compute any of its properties such as length, 
curvature, halos connectivity etc... 

As an application, we used our algorithm to study 
the evolution of the cosmic web, while comparing the time 
evolution of the skeleton (a proxy to the cosmic web) of a 
simulation, to those corresponding to different versions of its 
Zel'dovich approximation. We first compared the evolution 
of the respective lengths of the different skeletons and then 
introduced a method to compute pseudo-distances between 
different skeletons. This pseudo distance makes it possible 
to compare different features of the skeleton such as the 
size of their filaments and the similarity of their locations. 
Using these measurements, we showed that two effects were 
competing, with net result a decrease of the cosmic length 
with time: a general dilation of the larger scales filaments 
that could be captured by a simple deformation of the 
skeleton of the initial conditions on the one hand, and the 
shrinking, fusion and disappearance of the more numerous 
smaller scales filaments on the other hand. We also showed 
that a simple Zel'dovich approximation could accurately 
capture most features of the evolution of the cosmic web 
for all scales larger than a few megaparsecs (provided an 



© 0000 RAS, MNRAS 000, 000-000 



The fully connected N-D Skeleton 19 




Figure 16. A {50h~^)^ Mpc^ section of the 512'' particles simulation of a 250h~^ Mpc large box (only 1 particle every 8 is displayed). 
The purple skeleton is Ssimu{-2i L) (computed from the simulation), the green one Sza{z, Lcot) (computed on the Zel'dovich approximation 
using an effective smoothing length) and the blue one <Ssza(-2i ^nl) (computed by displacing the skeleton of the initial conditions). The 
simulation and corrected Zel'dovich approximation skeletons appear to be relatively close to each other and every individual filament has a 
counterpart in the other skeleton. The blue skeleton, computed from the skeleton Zel'dovich approximation, is more wiggly which reflects 
the small scale perturbation of the displacement field. Moreover, while many of its filaments have counterpart in the two other skeletons, 
others do not, as displacing the initial skeleton prevents the merging or disappearance of filaments. This results can be quantitatively 
measured, as shown on Figure [TSl and explained in appendix B. 



effective smoothing introduced by the approximation is 
taken into account). Hence in this context, the skeleton 
has proven to be a useful tool both for insight and as a 
quantitative probe and diagnostic. Conversely, the match 
between the simulated and the mapped skeleton confirms 
and extends geome trically the (point p rocess elliptical) 
peak patch theory (|Bond fc Mverd 1 19961 ) since both the 
peaks and their frontiers (the skeleton in 2D and the peak 
patch volumes in 3D) are well mapped by the Zel'dovich 
approximation. 



The domain of interest of the skeleton is quite vast and 
offer the prospect of a number of applications. 

From a theoretical poin t of view, using the po ints pre- 
sented in this paper and in ISousbie et al.l (|2008al ). we are 
presently developing a general theory of the sk eleton and its 
statistical properties (|Pogosvan et ahllin prep.h that aims to 
understand the properties of the critical lines of scale invari- 
ant Gaussian random fields as mathematical objects. In par- 
ticular, this companion paper provides quantitative analytic 
predictions for the length per unit volume (resp. curvature) 
of the critical lines and its scaling with the shape parameter 
of the field, and checks successfully the current algorithm 
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against these. In this paper we focussed on the skeleton. 
One could clearly investigate on the rest of the peak patch 
hierarchy and measure, say, the surface or volume of the 
(hyper)-surfaces of the recursion (whose last intersection is 
given by the primary critical lines). Another interesting is- 
sue would be to estimate the fraction of special (degenerate) 
points which do not satisfy t he Morse condition, wher e the 
fields behaves pathologically jPogosvan et ahllin prep.h . 

For instance, one of the shortcoming of the present algo- 
rithm concerns special fields where critical lines disappear, 
a situation which occurs, say, in the context of tracing den- 
drites in a neural network, or blood vessels within a liver. 
Note also that there exist other sets of (geometrical) critical 
lines that are not topological invariants such as the lines of 
steepest ascent connecting directly minima to maxima which 
are not accounted for by the present formalism. In contrast, 
the algorithm is well suited to identify bifurcation points, 
and the connectivity of the network. In particular, in an as- 
trophysical context, it would be worthwhile to make use of 
this feature and study statistically how the skeleton connects 
onto dark matter halos as a function of, say, they mass or 
spin, and investigate the details of local spin accretion in the 
context of the cosmic web superhigh ways, hence completing 
the spin alignment measurements of ISousbie et al.l (|2008al ) 
on smaller scales. More generally, the algorithm provides a 
neat bridge, via the provided connectivity, between the the- 
ory of continuous fields on the one hand, and graph theory 
for discrete networks on the other. This could prove to be 
of importance in the context of percolation theory. For in- 
stance, the percolation threshold can be explained in terms 
of the p roperties of the conn ectivity of the relevant nodes 
(see e.q. lColombi et al.1 (|2000l 'l'l. 

Here, as argued in section 2.4 we deliberately chose not to 
consider the issue of shot noise and its consequences on 
segmentation, for which no definitive solution yet exists, 
though many i mprovemen t s hav e been proposed in the lit- 
erature (see e.Q. iRoerdinkI (Il995l) '). Instead, we followed the 
approach of lSousbie et al.l (|2008al '). that simply involves con- 
volving the sampled density field with a large enough (in 
terms of sampling scale) Gaussian kernel so that the field 
can be considered smooth and differentiable; the probabilis- 
tic algorithm allows for the removal of sampling effects and 
small intensity residual shot noise. In appendix C we show 
that the corresponding fully connected skeleton is nonethe- 
less quite robust (the core of the network remains quasi 
unchanged), so long as the SNR is above one. A possible 
drawback of this method is that it introduces a smooth- 
ing scale attached to the skeleton. This is not necessarily a 
problem in cosmology as the scaling of the skeleton prop- 
erties with scale yields information on the distribution over 
these scales. Moreover, one is usually interested by the prop- 
erties of the skeleton on a given scale (typically larger than 
the halo scale, a few megaparsecs) . Nonetheless, there ex- 
ist more complex multi-scale sampli ng and smoot h ing te ch- 
niques such as the one presented in iPlaten et al.l (|2007l ) or 
IColberd (|200i1 ) that could straightforwardly be adapted to 
our implementation. All the algorithm requires is a struc- 
tured sampling grid where one can recover a one to one pixel 
neighbourhood (i.e. one needs to be able to find the neigh- 
bouring pixels of any pixel and these pixels must have the 
former as neighbour as well). For insta nce, we already im - 
plemented the algorithm for an Healpix (|G6rski et ahllioosl ) 



pixelisation of the sphere (see Figure I17p , while a direct im- 
plementation on a delaunay tesselation network is clearly an 
optiorQ. 

A natural extension of the theoretical component of this 
work would be to investigate numerically the properties of 
the bifur cation points in abstract space or anisotropic set- 
tings (see lPogosvan et al.l (|in prep.h for a theoretical discus- 
sion for isotropic Gaussian random fields) . For instance, i n 
the context of cosmic structure formation, iHanam 1 (|200lD . 
relied on the parallel between the skeleton of the density field 
in position-time 4D space and in position-scale 4D space 
to relate the two. In the former, the skeleton is a natural 
way of computing what is known as a halos merger tree, 
com monly used in semi- analytical galaxy formation models 
(see iHatton et al] (|2003l ) for instance): the skeleton traces 
the evolution of the cri tical points of the d ensity field in 
time. The peak theory (|Bond fc MversI 119961 ) tells us that 
the smoothing scale can be linked to time evolution on 
scales where gravitational effects remain weakly non-linear. 
A worthwhile goal is to establish the parallel between the 
properties of 4D skeleton in this position-smoothing scale 
space (which can be computed from the Gaussian initial 
conditions only) and the halo merger tree. Finally, note that 
classical bifurcation theory is concerned with the evolution 
of a critical point as a function of a control parameter. In 
the language of the skeleton, this evolution may correspond 
to the skeleton in the extended "phase space" . 

From the physical and observational point of view, 
an other interesting venue would be to apply the 
skeleton to actual galaxy cat alogs such as the SDSS 
lAdelman-McCarthv et aL l2008t ) to characterize the (uni- 
versal) statistics of filaments as physical objects, like halos 
or voids, and describe them in terms of their thickness, 
length, curvature and environmental properties (galaxies 
types, halo proximity, color and morphology gradient...), 
both in virtual and observed catalogs. It could also be used 
as a diagnosis tool for inverse methods which aim at re- 
constructing the three di mensional distributi on of the IGM 
from say QSO bundles (|Caucci et al.1 12008| ) or upcommg 
radio surveys (LOFAR, SKA etc..) Clearly the peak patch 
segmentation developed in this paper will also be useful in 
the context of the upcoming surveys such as the LSST, or 
the the SDSS-3 BAO surveys, for instance to identify rare 
events such as large walls or voids and study their shape. Its 
application to CMB related full sky data, such as WMAP 
or Planck should provide insight into, e.g. the level of non 
Gaussianity in these maps. Similarly, upcoming large scale 
weak lensing surveys (Dun e, SNAP...) could be analysed 
in terms of these tools (see IPichon et al.l (|in prep.[ ) for the 
validation of a reconstruction method in this context). 
Using the skeleton, the geometry of cold gas accretion 
that fuel stellar formation in the core of galaxies could be 
probed. The properties of the distribution of metals on 
smaller scales could be also investigated using peak patches, 
to see how they influence galactic properties; one could 
compare these statistical results to those obtai ned through 
WHIM detection by Oxyg en emission lines (|Aracil et al.l 
( 20041 ) ICaucci et all (120081')') . Indeed it has been claimed 



(see e.g. lOcvirk et al. I (|2008h iDekel et al.1 (|2008l ')') that 



for instance to segment regions on the surface of skull 
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the geometry of the cosmic inflow on a galaxy (its mass, 
temperature and entropy distribution, the connectivity of 
the local filaments network etc. ) is strongly correlated to 
its history and nature. 

In closing, let us emphasize again that the scope of ap- 
plication of the algorithm presented in this paper extends 
well beyond the context of the large scale structure of the 
universe: it could be used in any scientific of engineering 
context (medical tomography, geophysics, drilling ...) where 
the geometrical structure of a given field needs to be char- 
acterized. 
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Figure 17. from left to right: the 5 year WMAP release of the CMB temperature map, the corresponding peak patches and the peak 
patches of the same field smoothed over a FHWM of 420 arcmin. Different colours represent different patches. The algorithm described 
in section [2] is implemented here on the healpix pixelisation. 
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Figure Al. Illustration in the 2D case of the recursive minimization algorithm, applied to the case of Figure [5(b)| The reader can refer 
to the legend of Figure [5] for more details. The scalar field to minimize is represented by the blue shading in the background while its 
minimum is located at the intersection of the 3 quadrics. The red crosses locate the field minima along the edges while the red, blue and 
black sets of lines result from the first three recursion steps. 



APPENDIX A: A GENERIC MINIMIZATION ALGORITHM 

In this appendix, we present a generic algorithm that aims at minimizing a multi-hnear scalar function f{xi,..,Xd) of d 
variables within a polygonal volume, in a d-dimensional space, by reducing the problem to finding the respective minima of a 
set of polynomials of order d. It takes as input the location of the minima, Aff , of f{xi, ..,2:^) on the edges of the square and 
simply consists in recursively minimizing the value of f{xi, ..,Xd) along the lines joining them. 

Let us first consider the 2D case illustrated by Figure lAll where the cell is a square. In this case, three minima, 
and M3 (represented by red crosses) can be easily found on the edges of the square from the linearly interpolated value of / 
along them. One can then compute the location of the minima along the three lines linking them (the red triangle), noting 
that because of the multi-linearity of /, its value along a line can be expressed as a second order polynomial. One thus obtains 
3 new points. Ml, M2 and M3, and the process can be repeated, as represented by the blue and black sets of lines, until 
convergence to the solution, represented by the blue cross (i.e. when the three points are close enough to each other). 

This algorithm can be generalized to the case of the p-face of an n-cubic cell, p ^ n, thus providing the solution over the 
j9-face from the k solutions, Mf i € {1, .., k} , over the sets of (p — l)-faces that are its edges. As explained in section [2.3.31 
this algorithm is thus recursively applied to the edges of the cell, starting from the 1-faces, in the order of their increasing 
dimensionality. The j"* step of the algorithm thus goes as follows: 

(i) Compute the equations of the {k){k — l)/2 lines joining pairs of M^^^ . 

(ii) Evaluate the value of f{xi, ..,Xd) at p+ 1 points along these lines using multi-linear interpolation, and fit a polynomial 
of order p. 

(iii) Find the minima of these polynomials that belong to the cell and keep the k lowest among them, with coordinates Af,^ . 

(iv) If these points are all contained in a sphere of radius a given fraction of the cell, stop, else start over. 

Note that although only the case of a Cartesian sampling grid was presented here, the algorithm is easily transposable to 
any type of grid, such as the one produced by Voronoi tessellation on a manifold, which is composed of simplex shape cells. 



APPENDIX B: INTER-SKELETON PSEUDO-DISTANCE 

The inter-skeleton pseudo-distance from one skeleton Sa to another skeleton St was defined in the main text by the probability 
distribution function (PDF) of the minimum of the distance from each segment of 5a to any segments of Sb- In this appendix, 
we show how this measure can be interpreted using realizations of scale invariant Gaussian random fields (GRFs) with 
different power spectrum index n (such that P (k) cx fc~") and different smoothing lengths L. All the skeletons that we use 
were computed from 512^ pixels realizations of GRFs, smoothed over a scale L = 8 pixels or Li, = 16 pixels. These scales 
are defined as the width of the Gaussian kernel that we used to smooth the fields and the value of L roughly corresponds, in 
number of pixels, to the smoothing scale we used in the main text, Lnl. A total of six different skeletons were computed: 
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Figure Bl. Measures of the inter-skeleton 
smoothing length L. These plots show how 
skeletons. 



pseudo distances for Gaussian random fields with different power spectrum index n and 
the pseudo-distances measurements can be used to assess the discrepancies between two 



• iSgrfo and <Sgrfo': skeletons computed from two realizations (GRFO and GRFO') of GRFs with spectral index n = 0, 
smoothed over a scale L = 8 pixels. 

• 5grf3 and iSgrfs': skeletons computed from two realizations (GRFS and GRF3') of GRFs with spectral index n = 3, 
smoothed over a scale L — 8 pixels. 

• 5grfot' this skeleton was computed from the field GRFO, smoothed on scale L. The resulting skeleton was then 
translated by v = (L/2, 0, 0). 

• 5grfol: this skeleton was computed from the field GRFO, smoothed on scale Ll — 2L = 16 pixels. 



Figure iBll presents the different pseudo-distances between these skeletons, D(a, b). Figures |l(a)| and |l(b)| present 
the results obtained when comparing uncorrelated fields (i.e. different realizations of GRFs). As expected in that case, 
©(GRFO, GRFO') = ©(GRFO', GRFO) and ©(GRF3, GRF3') = I?(GRF3', GRF3) and the position of the mode is about the 
smoothing length. One should also note that the mode intensity differs between n = and n — 3, which can be explained by 
the fact that in the latter case, small scale fluctuations are suppressed together with smaller scale filaments, thus making 
it less probable for a segment of one realization to be very close to one of the other realization. Figure 1(c) shows that 
these pseudo distance measurements make it possible to distinguish the different nature of two skeletons. In fact, whereas 
5grfo has filaments on any scales, only the larger scales are present in 5grf3, which translates into an asymmetry between 
©(GRFO, GRF3) and ©(GRF3, GRFO). Whereas in the first case, there is no reason why every segment of 5grfo should be 
close to a segment of 5grf3, the reciprocal is not true ; 5grfo spreads on all scales and every segment of 5grf3 should be as 
close Eis any other from a segment of 5grfo (hence the higher intensity of the mode for ©(GRF3, GRFO)). When comparing 
a skeleton 5a with less filaments to a skeleton 5b with more filaments, the intensity of the mode is thus expected to be higher 
for ©(a, b) than for ©(b, a). 
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Figure CI. The evolution of the PDF of distances at the mode as a function of the SNR of a noisy field. Here the distance is computed 
between the reference skeleton and its noisy counter part. For SNR above one, only small differences between weak filaments account for 
the difference between the two distances. Conversely, for more noisy fields, the fraction of match between the two skeleton drops. 



This observation is confirmed by Figure 1(e) where 5grfOl is compared to 5grfo, which has smaU scale filaments that 



5grfol does not have. But in that case, the two skeletons are correlated as only the smoothing length changes. This results in 
a higher intensity of the mode of ©(GRFOl, GRFO): the larger scale filaments are present in both skeletons. It also results in 
a shift in the position of the mode, located at a distance smaller than the smoothing length. Figure 1(d) illustrates the case of 



a simple translation of length half the smoothing length L: in that case, both PDFs are identical and a very asymmetric and 
high intensity mode is present at distance L/2. Finally, it is also interesting to note that the comparison of 5grfol to iSgrfot 
almost gives the exact same result as the one for 5grfol to 5grfo and it is difficult to distinguish one from the other. 



APPENDIX C: ROBUSTNESS OF FULLY CONNECTED SKELETON 

In order to investigate the robustness of the fully connected skeleton with respect to small changes in the underlying field, the 
following experiment is carried. A given 2D white random field of size 4096^ is generated. It is then smoothed over 10 pixels, 
and its reference skeleton, Sref is computed. A white random field of amplitude SNR is added to the reference field, and the 
corresponding skeleton, 5snr, is computed after smoothing over 10 pixels. The PDF of the pseudo- distances I>(5ref , 5snr) and 
T>{SsNR, Srci) is then calculated (see Appendix B). The distance at the maximum (its mode) of both PDF remains unchanged 
for all the SNR considered (1/8, 1/4, 1/2, 1,2, 4), which demonstrates that the core of the skeleton is quite robust: the reference 
skeleton is always shadowed by its noisy counter part. The amplitude of the PDF at its maximum is plotted in Figure ICll 
This amplitude is sensitive to the high distance tail of mismatch between the two skeleton since the PDF is normalized. In 
short, within the network there is a small subset of filaments which are sensitive to any small variation of the field. For the 
vast majority of the network, the skeleton is globally only weakly affected by changes of the underlying field so long as the 
amplitude of the change has a SNR above one. When the SNR drops bellow one, spurious filaments occur more and more. The 
discrepancy between the two plateaux at larger SNR refiects the fact that weaker filaments will occur somewhat randomly 
from one realisation to another, depending on very small details in the field. 
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